Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 2956 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 300.3 KiB |
| Average record size in memory | 104.0 B |
Variable types
| Numeric | 13 |
|---|---|
Warnings
| Warning | Type |
|---|---|
| monetary is highly correlated with qtde_invoices and 1 other field | High correlation |
| qtde_invoices is highly correlated with monetary and 2 other fields | High correlation |
| qtde_items is highly correlated with monetary and 1 other field | High correlation |
| qtde_products is highly correlated with qtde_invoices | High correlation |
| avg_ticket is highly correlated with qtde_returns and 1 other field | High correlation |
| qtde_returns is highly correlated with avg_ticket | High correlation |
| avg_basket_size is highly correlated with avg_ticket | High correlation |
| monetary is highly correlated with qtde_invoices and 3 other fields | High correlation |
| recency_days is highly correlated with qtde_invoices | High correlation |
| qtde_invoices is highly correlated with monetary and 3 other fields | High correlation |
| qtde_items is highly correlated with monetary and 3 other fields | High correlation |
| qtde_products is highly correlated with monetary and 3 other fields | High correlation |
| avg_ticket is highly correlated with avg_unique_basket_size | High correlation |
| avg_recency_days is highly correlated with frequency | High correlation |
| frequency is highly correlated with avg_recency_days | High correlation |
| avg_basket_size is highly correlated with monetary and 1 other field | High correlation |
| avg_unique_basket_size is highly correlated with qtde_products and 1 other field | High correlation |
| monetary is highly correlated with qtde_items and 1 other field | High correlation |
| qtde_invoices is highly correlated with qtde_items | High correlation |
| qtde_items is highly correlated with monetary and 3 other fields | High correlation |
| qtde_products is highly correlated with monetary and 1 other field | High correlation |
| avg_recency_days is highly correlated with frequency | High correlation |
| frequency is highly correlated with avg_recency_days | High correlation |
| avg_basket_size is highly correlated with qtde_items | High correlation |
| avg_ticket is highly correlated with qtde_returns and 1 other field | High correlation |
| monetary is highly correlated with qtde_returns and 4 other fields | High correlation |
| qtde_returns is highly correlated with avg_ticket and 5 other fields | High correlation |
| avg_unique_basket_size is highly correlated with avg_basket_size | High correlation |
| avg_basket_size is highly correlated with avg_ticket and 4 other fields | High correlation |
| qtde_products is highly correlated with monetary and 3 other fields | High correlation |
| qtde_invoices is highly correlated with monetary and 3 other fields | High correlation |
| qtde_items is highly correlated with monetary and 4 other fields | High correlation |
| avg_ticket is highly skewed (γ1 = 25.0178421) | Skewed |
| frequency is highly skewed (γ1 = 25.05874785) | Skewed |
| qtde_returns is highly skewed (γ1 = 23.49789957) | Skewed |
| df_index has unique values | Unique |
| customer_id has unique values | Unique |
| recency_days has 33 (1.1%) zeros | Zeros |
| qtde_returns has 1480 (50.1%) zeros | Zeros |
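The γ1 values reported for the skewed variables are skewness coefficients. As a rough illustration, the sketch below computes the moment-based Fisher-Pearson coefficient and the share of zeros for a small made-up sample. The sample and the function name `skewness_g1` are hypothetical, and the report may well use a bias-adjusted estimator, so this is not expected to reproduce the reported figures exactly.

```python
def skewness_g1(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical sample: mostly small values plus one large outlier,
# loosely imitating a "many zeros, long right tail" variable.
sample = [0, 0, 0, 1, 1, 2, 40]

g1 = skewness_g1(sample)
zero_share = sample.count(0) / len(sample)

print(f"skewness g1 = {g1:.3f}, zeros = {zero_share:.1%}")
```

A symmetric sample such as `[1, 2, 3, 4, 5]` gives a coefficient of zero, while a single large outlier on the right pushes it strongly positive.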
Reproduction
| Analysis started | 2021-08-18 15:56:07.854033 |
|---|---|
| Analysis finished | 2021-08-18 15:56:45.393421 |
| Duration | 37.54 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
df_index
| Distinct | 2956 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2312.443505 |
| Minimum | 0 |
| Maximum | 5701 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 23.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 184.75 |
| Q1 | 924.5 |
| median | 2116.5 |
| Q3 | 3531.75 |
| 95-th percentile | 5025.5 |
| Maximum | 5701 |
| Range | 5701 |
| Interquartile range (IQR) | 2607.25 |
Descriptive statistics
| Standard deviation | 1552.846475 |
|---|---|
| Coefficient of variation (CV) | 0.6715175839 |
| Kurtosis | -1.0154754 |
| Mean | 2312.443505 |
| Median Absolute Deviation (MAD) | 1269 |
| Skewness | 0.3413956713 |
| Sum | 6835583 |
| Variance | 2411332.176 |
| Monotonicity | Strictly increasing |
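Several of the figures just above are derived from others in the same tables. The short check below recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation. The variable names are illustrative only, and small differences in the last digits are expected because the inputs are already rounded.

```python
import math

# Values copied from the quantile and descriptive statistics tables above.
minimum, maximum = 0, 5701
q1, q3 = 924.5, 3531.75
mean, std = 2312.443505, 1552.846475

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared

# Compare against the reported values, allowing for rounding of the inputs.
print(value_range == 5701)
print(iqr == 2607.25)
print(math.isclose(cv, 0.6715175839, rel_tol=1e-6))
print(math.isclose(variance, 2411332.176, rel_tol=1e-6))
```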
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 3005 | 1 | < 0.1% |
| 2990 | 1 | < 0.1% |
| 2993 | 1 | < 0.1% |
| 2994 | 1 | < 0.1% |
| 2995 | 1 | < 0.1% |
| 2996 | 1 | < 0.1% |
| 2999 | 1 | < 0.1% |
| 3001 | 1 | < 0.1% |
| 3002 | 1 | < 0.1% |
| Other values (2946) | 2946 |
| Value | Count | Frequency (%) |
| 0 | 1 | |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 |
| Value | Count | Frequency (%) |
| 5701 | 1 | |
| 5682 | 1 | |
| 5672 | 1 | |
| 5666 | 1 | |
| 5645 | 1 | |
| 5641 | 1 | |
| 5635 | 1 | |
| 5624 | 1 | |
| 5623 | 1 | |
| 5613 | 1 |
customer_id
| Distinct | 2956 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15270.85555 |
| Minimum | 12347 |
| Maximum | 18287 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 23.2 KiB |
Quantile statistics
| Minimum | 12347 |
|---|---|
| 5-th percentile | 12618.25 |
| Q1 | 13801.25 |
| median | 15222 |
| Q3 | 16767.25 |
| 95-th percentile | 17964.25 |
| Maximum | 18287 |
| Range | 5940 |
| Interquartile range (IQR) | 2966 |
Descriptive statistics
| Standard deviation | 1717.660762 |
|---|---|
| Coefficient of variation (CV) | 0.112479668 |
| Kurtosis | -1.203433731 |
| Mean | 15270.85555 |
| Median Absolute Deviation (MAD) | 1486.5 |
| Skewness | 0.03069983665 |
| Sum | 45140649 |
| Variance | 2950358.492 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 17850 | 1 | < 0.1% |
| 17588 | 1 | < 0.1% |
| 14905 | 1 | < 0.1% |
| 16103 | 1 | < 0.1% |
| 14626 | 1 | < 0.1% |
| 14868 | 1 | < 0.1% |
| 18246 | 1 | < 0.1% |
| 17115 | 1 | < 0.1% |
| 16611 | 1 | < 0.1% |
| 15912 | 1 | < 0.1% |
| Other values (2946) | 2946 |
| Value | Count | Frequency (%) |
| 12347 | 1 | |
| 12348 | 1 | |
| 12352 | 1 | |
| 12356 | 1 | |
| 12358 | 1 | |
| 12359 | 1 | |
| 12360 | 1 | |
| 12362 | 1 | |
| 12364 | 1 | |
| 12370 | 1 |
| Value | Count | Frequency (%) |
| 18287 | 1 | |
| 18283 | 1 | |
| 18282 | 1 | |
| 18277 | 1 | |
| 18276 | 1 | |
| 18274 | 1 | |
| 18273 | 1 | |
| 18272 | 1 | |
| 18270 | 1 | |
| 18269 | 1 |
monetary
| Distinct | 2942 |
|---|---|
| Distinct (%) | 99.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2645.419838 |
| Minimum | 6.2 |
| Maximum | 272345.66 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 23.2 KiB |
Quantile statistics
| Minimum | 6.2 |
|---|---|
| 5-th percentile | 226.315 |
| Q1 | 556.4725 |
| median | 1064.525 |
| Q3 | 2245.6675 |
| 95-th percentile | 7087.02 |
| Maximum | 272345.66 |
| Range | 272339.46 |
| Interquartile range (IQR) | 1689.195 |
Descriptive statistics
| Standard deviation | 9989.458288 |
|---|---|
| Coefficient of variation (CV) | 3.776133431 |
| Kurtosis | 401.0034374 |
| Mean | 2645.419838 |
| Median Absolute Deviation (MAD) | 652.88 |
| Skewness | 17.73829894 |
| Sum | 7819861.04 |
| Variance | 99789276.88 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 951.54 | 2 | 0.1% |
| 1736.14 | 2 | 0.1% |
| 1133.25 | 2 | 0.1% |
| 695.42 | 2 | 0.1% |
| 308.32 | 2 | 0.1% |
| 490.22 | 2 | 0.1% |
| 296.55 | 2 | 0.1% |
| 1682.8 | 2 | 0.1% |
| 175.92 | 2 | 0.1% |
| 740.95 | 2 | 0.1% |
| Other values (2932) | 2936 |
| Value | Count | Frequency (%) |
| 6.2 | 1 | |
| 13.3 | 1 | |
| 15 | 1 | |
| 36.06 | 1 | |
| 43.08 | 1 | |
| 45 | 1 | |
| 52 | 1 | |
| 52.2 | 1 | |
| 61.65 | 1 | |
| 67.67 | 1 |
| Value | Count | Frequency (%) |
| 272345.66 | 1 | |
| 259657.3 | 1 | |
| 194550.79 | 1 | |
| 136626.96 | 1 | |
| 120193.93 | 1 | |
| 115727.35 | 1 | |
| 87716.48 | 1 | |
| 72882.09 | 1 | |
| 62078.12 | 1 | |
| 60402.22 | 1 |
recency_days
| Distinct | 272 |
|---|---|
| Distinct (%) | 9.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 64.26488498 |
| Minimum | 0 |
| Maximum | 373 |
| Zeros | 33 |
| Zeros (%) | 1.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 23.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 11 |
| median | 31 |
| Q3 | 81 |
| 95-th percentile | 242 |
| Maximum | 373 |
| Range | 373 |
| Interquartile range (IQR) | 70 |
Descriptive statistics
| Standard deviation | 77.81710432 |
|---|---|
| Coefficient of variation (CV) | 1.210880629 |
| Kurtosis | 2.780282763 |
| Mean | 64.26488498 |
| Median Absolute Deviation (MAD) | 26 |
| Skewness | 1.799627449 |
| Sum | 189967 |
| Variance | 6055.501724 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 99 | 3.3% |
| 4 | 87 | 2.9% |
| 2 | 85 | 2.9% |
| 3 | 85 | 2.9% |
| 8 | 76 | 2.6% |
| 10 | 67 | 2.3% |
| 9 | 66 | 2.2% |
| 7 | 66 | 2.2% |
| 17 | 64 | 2.2% |
| 22 | 55 | 1.9% |
| Other values (262) | 2206 |
| Value | Count | Frequency (%) |
| 0 | 33 | 1.1% |
| 1 | 99 | |
| 2 | 85 | |
| 3 | 85 | |
| 4 | 87 | |
| 5 | 43 | |
| 7 | 66 | |
| 8 | 76 | |
| 9 | 66 | |
| 10 | 67 |
| Value | Count | Frequency (%) |
| 373 | 2 | |
| 372 | 4 | |
| 371 | 1 | < 0.1% |
| 368 | 1 | < 0.1% |
| 366 | 4 | |
| 365 | 2 | |
| 364 | 1 | < 0.1% |
| 360 | 1 | < 0.1% |
| 359 | 1 | < 0.1% |
| 358 | 4 |
qtde_invoices
| Distinct | 54 |
|---|---|
| Distinct (%) | 1.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.720906631 |
| Minimum | 1 |
| Maximum | 202 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 23.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 4 |
| Q3 | 6 |
| 95-th percentile | 17 |
| Maximum | 202 |
| Range | 201 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 8.818852824 |
|---|---|
| Coefficient of variation (CV) | 1.541513154 |
| Kurtosis | 187.4827702 |
| Mean | 5.720906631 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 10.67296559 |
| Sum | 16911 |
| Variance | 77.77216513 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2 | 783 | |
| 3 | 498 | |
| 4 | 387 | |
| 5 | 237 | 8.0% |
| 1 | 188 | 6.4% |
| 6 | 175 | 5.9% |
| 7 | 136 | 4.6% |
| 8 | 99 | 3.3% |
| 9 | 67 | 2.3% |
| 10 | 55 | 1.9% |
| Other values (44) | 331 |
| Value | Count | Frequency (%) |
| 1 | 188 | 6.4% |
| 2 | 783 | |
| 3 | 498 | |
| 4 | 387 | |
| 5 | 237 | 8.0% |
| 6 | 175 | 5.9% |
| 7 | 136 | 4.6% |
| 8 | 99 | 3.3% |
| 9 | 67 | 2.3% |
| 10 | 55 | 1.9% |
| Value | Count | Frequency (%) |
| 202 | 1 | |
| 199 | 1 | |
| 123 | 1 | |
| 97 | 1 | |
| 91 | 2 | |
| 85 | 1 | |
| 72 | 1 | |
| 62 | 2 | |
| 60 | 1 | |
| 57 | 1 |
qtde_items
| Distinct | 1577 |
|---|---|
| Distinct (%) | 53.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1381.422869 |
| Minimum | 1 |
| Maximum | 177148 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 23.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 93 |
| Q1 | 267 |
| median | 554.5 |
| Q3 | 1211.75 |
| 95-th percentile | 3862.5 |
| Maximum | 177148 |
| Range | 177147 |
| Interquartile range (IQR) | 944.75 |
Descriptive statistics
| Standard deviation | 4996.234518 |
|---|---|
| Coefficient of variation (CV) | 3.616730714 |
| Kurtosis | 570.0613135 |
| Mean | 1381.422869 |
| Median Absolute Deviation (MAD) | 360.5 |
| Skewness | 19.68192671 |
| Sum | 4083486 |
| Variance | 24962359.36 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 88 | 9 | 0.3% |
| 150 | 9 | 0.3% |
| 260 | 9 | 0.3% |
| 200 | 9 | 0.3% |
| 240 | 9 | 0.3% |
| 306 | 8 | 0.3% |
| 84 | 8 | 0.3% |
| 272 | 8 | 0.3% |
| 246 | 8 | 0.3% |
| 360 | 8 | 0.3% |
| Other values (1567) | 2871 |
| Value | Count | Frequency (%) |
| 1 | 1 | < 0.1% |
| 2 | 2 | |
| 12 | 2 | |
| 16 | 3 | |
| 17 | 1 | < 0.1% |
| 18 | 1 | < 0.1% |
| 20 | 1 | < 0.1% |
| 23 | 1 | < 0.1% |
| 25 | 2 | |
| 26 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 177148 | 1 | |
| 69993 | 1 | |
| 66368 | 1 | |
| 64493 | 1 | |
| 64124 | 1 | |
| 52259 | 1 | |
| 52013 | 1 | |
| 40207 | 1 | |
| 39984 | 1 | |
| 36978 | 1 |
qtde_products
| Distinct | 453 |
|---|---|
| Distinct (%) | 15.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 117.9333559 |
| Minimum | 1 |
| Maximum | 7599 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 23.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 8 |
| Q1 | 28 |
| median | 64 |
| Q3 | 130.25 |
| 95-th percentile | 369.5 |
| Maximum | 7599 |
| Range | 7598 |
| Interquartile range (IQR) | 102.25 |
Descriptive statistics
| Standard deviation | 259.4387994 |
|---|---|
| Coefficient of variation (CV) | 2.199876341 |
| Kurtosis | 362.2230087 |
| Mean | 117.9333559 |
| Median Absolute Deviation (MAD) | 42 |
| Skewness | 15.86797052 |
| Sum | 348611 |
| Variance | 67308.49065 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 28 | 51 | 1.7% |
| 15 | 36 | 1.2% |
| 35 | 35 | 1.2% |
| 11 | 33 | 1.1% |
| 27 | 33 | 1.1% |
| 24 | 32 | 1.1% |
| 20 | 32 | 1.1% |
| 19 | 32 | 1.1% |
| 25 | 32 | 1.1% |
| 29 | 31 | 1.0% |
| Other values (443) | 2609 |
| Value | Count | Frequency (%) |
| 1 | 5 | 0.2% |
| 2 | 13 | |
| 3 | 16 | |
| 4 | 15 | |
| 5 | 26 | |
| 6 | 30 | |
| 7 | 21 | |
| 8 | 25 | |
| 9 | 23 | |
| 10 | 30 |
| Value | Count | Frequency (%) |
| 7599 | 1 | |
| 5337 | 1 | |
| 5095 | 1 | |
| 4222 | 1 | |
| 2630 | 1 | |
| 2326 | 1 | |
| 1905 | 1 | |
| 1806 | 1 | |
| 1577 | 1 | |
| 1487 | 1 |
avg_ticket
| Distinct | 2953 |
|---|---|
| Distinct (%) | 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 33.12601685 |
| Minimum | 2.25375 |
| Maximum | 4453.43 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 23.2 KiB |
Quantile statistics
| Minimum | 2.25375 |
|---|---|
| 5-th percentile | 5.040963463 |
| Q1 | 13.45986335 |
| median | 18.22775 |
| Q3 | 25.28530327 |
| 95-th percentile | 87.30529872 |
| Maximum | 4453.43 |
| Range | 4451.17625 |
| Interquartile range (IQR) | 11.82543992 |
Descriptive statistics
| Standard deviation | 120.096576 |
|---|---|
| Coefficient of variation (CV) | 3.625445721 |
| Kurtosis | 802.2345013 |
| Mean | 33.12601685 |
| Median Absolute Deviation (MAD) | 5.982908497 |
| Skewness | 25.0178421 |
| Sum | 97920.50581 |
| Variance | 14423.18757 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 16.408 | 2 | 0.1% |
| 15 | 2 | 0.1% |
| 18.375 | 2 | 0.1% |
| 18.15222222 | 1 | < 0.1% |
| 17.07774194 | 1 | < 0.1% |
| 20.51104167 | 1 | < 0.1% |
| 149.025 | 1 | < 0.1% |
| 21.75945946 | 1 | < 0.1% |
| 12.949 | 1 | < 0.1% |
| 13.92736842 | 1 | < 0.1% |
| Other values (2943) | 2943 |
| Value | Count | Frequency (%) |
| 2.25375 | 1 | |
| 2.47505618 | 1 | |
| 2.522716049 | 1 | |
| 2.755294118 | 1 | |
| 2.766153846 | 1 | |
| 2.8167 | 1 | |
| 2.825576923 | 1 | |
| 2.86284153 | 1 | |
| 2.875678322 | 1 | |
| 2.905330661 | 1 |
| Value | Count | Frequency (%) |
| 4453.43 | 1 | |
| 3202.92 | 1 | |
| 1687.2 | 1 | |
| 1102.36 | 1 | |
| 952.9875 | 1 | |
| 859.44 | 1 | |
| 663.2571429 | 1 | |
| 651.1683333 | 1 | |
| 624.4 | 1 | |
| 615.75 | 1 |
avg_recency_days
| Distinct | 1254 |
|---|---|
| Distinct (%) | 42.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -67.18966855 |
| Minimum | -366 |
| Maximum | -1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 2956 |
| Negative (%) | 100.0% |
| Memory size | 23.2 KiB |
Quantile statistics
| Minimum | -366 |
|---|---|
| 5-th percentile | -200 |
| Q1 | -85.33333333 |
| median | -48.5 |
| Q3 | -25.88928571 |
| 95-th percentile | -7.96875 |
| Maximum | -1 |
| Range | 365 |
| Interquartile range (IQR) | 59.44404762 |
Descriptive statistics
| Standard deviation | 63.38252873 |
|---|---|
| Coefficient of variation (CV) | -0.9433374222 |
| Kurtosis | 4.955966636 |
| Mean | -67.18966855 |
| Median Absolute Deviation (MAD) | 26.35416667 |
| Skewness | -2.072623922 |
| Sum | -198612.6602 |
| Variance | 4017.344948 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -14 | 24 | 0.8% |
| -4 | 22 | 0.7% |
| -70 | 21 | 0.7% |
| -7 | 20 | 0.7% |
| -35 | 18 | 0.6% |
| -49 | 18 | 0.6% |
| -21 | 17 | 0.6% |
| -46 | 17 | 0.6% |
| -11 | 17 | 0.6% |
| -5 | 16 | 0.5% |
| Other values (1244) | 2766 |
| Value | Count | Frequency (%) |
| -366 | 1 | < 0.1% |
| -365 | 1 | < 0.1% |
| -363 | 1 | < 0.1% |
| -362 | 1 | < 0.1% |
| -357 | 2 | |
| -356 | 1 | < 0.1% |
| -355 | 2 | |
| -352 | 1 | < 0.1% |
| -351 | 2 | |
| -350 | 3 |
| Value | Count | Frequency (%) |
| -1 | 16 | |
| -1.5 | 1 | < 0.1% |
| -2 | 13 | |
| -2.5 | 1 | < 0.1% |
| -2.601398601 | 1 | < 0.1% |
| -3 | 15 | |
| -3.321428571 | 1 | < 0.1% |
| -3.390909091 | 1 | < 0.1% |
| -3.5 | 2 | 0.1% |
| -4 | 22 |
frequency
| Distinct | 1219 |
|---|---|
| Distinct (%) | 41.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1133185025 |
| Minimum | 0.005449591281 |
| Maximum | 17 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 23.2 KiB |
Quantile statistics
| Minimum | 0.005449591281 |
|---|---|
| 5-th percentile | 0.008888888889 |
| Q1 | 0.01633986928 |
| median | 0.0259553127 |
| Q3 | 0.04956870588 |
| 95-th percentile | 1 |
| Maximum | 17 |
| Range | 16.99455041 |
| Interquartile range (IQR) | 0.0332288366 |
Descriptive statistics
| Standard deviation | 0.4076538922 |
|---|---|
| Coefficient of variation (CV) | 3.597416866 |
| Kurtosis | 998.6544668 |
| Mean | 0.1133185025 |
| Median Absolute Deviation (MAD) | 0.01217802703 |
| Skewness | 25.05874785 |
| Sum | 334.9694934 |
| Variance | 0.1661816958 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 196 | 6.6% |
| 0.02777777778 | 17 | 0.6% |
| 0.0625 | 17 | 0.6% |
| 0.02380952381 | 16 | 0.5% |
| 0.08333333333 | 15 | 0.5% |
| 0.09090909091 | 15 | 0.5% |
| 0.03448275862 | 14 | 0.5% |
| 0.02941176471 | 14 | 0.5% |
| 0.07692307692 | 14 | 0.5% |
| 0.02173913043 | 13 | 0.4% |
| Other values (1209) | 2625 |
| Value | Count | Frequency (%) |
| 0.005449591281 | 1 | < 0.1% |
| 0.005464480874 | 1 | < 0.1% |
| 0.005479452055 | 1 | < 0.1% |
| 0.005494505495 | 1 | < 0.1% |
| 0.005586592179 | 2 | |
| 0.005602240896 | 1 | < 0.1% |
| 0.005617977528 | 2 | |
| 0.00566572238 | 1 | < 0.1% |
| 0.005681818182 | 2 | |
| 0.005698005698 | 3 |
| Value | Count | Frequency (%) |
| 17 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 2 | 5 | 0.2% |
| 1.5 | 1 | < 0.1% |
| 1.142857143 | 1 | < 0.1% |
| 1 | 196 | |
| 0.75 | 1 | < 0.1% |
| 0.6666666667 | 3 | 0.1% |
| 0.5401069519 | 1 | < 0.1% |
| 0.5335120643 | 1 | < 0.1% |
qtde_returns
| Distinct | 206 |
|---|---|
| Distinct (%) | 7.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 32.32476319 |
| Minimum | 0 |
| Maximum | 9014 |
| Zeros | 1480 |
| Zeros (%) | 50.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 23.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 8 |
| 95-th percentile | 87.25 |
| Maximum | 9014 |
| Range | 9014 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 273.6783149 |
|---|---|
| Coefficient of variation (CV) | 8.466521882 |
| Kurtosis | 672.3503412 |
| Mean | 32.32476319 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 23.49789957 |
| Sum | 95552 |
| Variance | 74899.82004 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1480 | |
| 1 | 169 | 5.7% |
| 2 | 150 | 5.1% |
| 3 | 106 | 3.6% |
| 4 | 90 | 3.0% |
| 6 | 78 | 2.6% |
| 5 | 60 | 2.0% |
| 12 | 50 | 1.7% |
| 9 | 44 | 1.5% |
| 7 | 43 | 1.5% |
| Other values (196) | 686 |
| Value | Count | Frequency (%) |
| 0 | 1480 | |
| 1 | 169 | 5.7% |
| 2 | 150 | 5.1% |
| 3 | 106 | 3.6% |
| 4 | 90 | 3.0% |
| 5 | 60 | 2.0% |
| 6 | 78 | 2.6% |
| 7 | 43 | 1.5% |
| 8 | 42 | 1.4% |
| 9 | 44 | 1.5% |
| Value | Count | Frequency (%) |
| 9014 | 1 | |
| 8004 | 1 | |
| 4427 | 1 | |
| 3217 | 1 | |
| 2878 | 1 | |
| 2256 | 1 | |
| 2022 | 1 | |
| 2012 | 1 | |
| 1594 | 1 | |
| 1534 | 1 |
avg_basket_size
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 1920 |
|---|---|
| Distinct (%) | 65.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 205.6046041 |
| Minimum | 1 |
| Maximum | 6009.333333 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 23.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 41.9375 |
| Q1 | 93.5 |
| median | 152.4166667 |
| Q3 | 243.0833333 |
| 95-th percentile | 510.6875 |
| Maximum | 6009.333333 |
| Range | 6008.333333 |
| Interquartile range (IQR) | 149.5833333 |
Descriptive statistics
| Standard deviation | 251.7862728 |
|---|---|
| Coefficient of variation (CV) | 1.224613981 |
| Kurtosis | 154.3071838 |
| Mean | 205.6046041 |
| Median Absolute Deviation (MAD) | 69.85119048 |
| Skewness | 9.478244221 |
| Sum | 607767.2098 |
| Variance | 63396.32716 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100 | 13 | 0.4% |
| 82 | 11 | 0.4% |
| 114 | 10 | 0.3% |
| 140 | 9 | 0.3% |
| 48 | 9 | 0.3% |
| 120 | 9 | 0.3% |
| 91 | 8 | 0.3% |
| 81 | 8 | 0.3% |
| 88 | 8 | 0.3% |
| 103 | 8 | 0.3% |
| Other values (1910) | 2863 |
| Value | Count | Frequency (%) |
| 1 | 2 | |
| 2 | 1 | |
| 3.333333333 | 1 | |
| 5.333333333 | 1 | |
| 5.666666667 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 2 | |
| 8.333333333 | 1 | |
| 11 | 1 |
| Value | Count | Frequency (%) |
| 6009.333333 | 1 | |
| 4282 | 1 | |
| 3906 | 1 | |
| 3224.65 | 1 | |
| 2880 | 1 | |
| 2460.388889 | 1 | |
| 2441 | 1 | |
| 2323.076923 | 1 | |
| 1866.933333 | 1 | |
| 1826.333333 | 1 |
avg_unique_basket_size
| Distinct | 898 |
|---|---|
| Distinct (%) | 30.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.74430953 |
| Minimum | 0.2 |
| Maximum | 246 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 23.2 KiB |
Quantile statistics
| Minimum | 0.2 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 7.5 |
| median | 13 |
| Q3 | 21.39236111 |
| 95-th percentile | 43 |
| Maximum | 246 |
| Range | 245.8 |
| Interquartile range (IQR) | 13.89236111 |
Descriptive statistics
| Standard deviation | 14.73738397 |
|---|---|
| Coefficient of variation (CV) | 0.8801428301 |
| Kurtosis | 29.55696068 |
| Mean | 16.74430953 |
| Median Absolute Deviation (MAD) | 6.392307692 |
| Skewness | 3.458822346 |
| Sum | 49496.17896 |
| Variance | 217.1904864 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 11 | 52 | 1.8% |
| 12 | 45 | 1.5% |
| 14 | 42 | 1.4% |
| 16 | 41 | 1.4% |
| 5 | 38 | 1.3% |
| 8 | 38 | 1.3% |
| 9 | 38 | 1.3% |
| 13 | 38 | 1.3% |
| 17 | 37 | 1.3% |
| 7.5 | 35 | 1.2% |
| Other values (888) | 2552 |
| Value | Count | Frequency (%) |
| 0.2 | 1 | < 0.1% |
| 0.25 | 2 | 0.1% |
| 0.3181818182 | 1 | < 0.1% |
| 0.3333333333 | 6 | |
| 0.4 | 1 | < 0.1% |
| 0.5 | 10 | |
| 0.5454545455 | 1 | < 0.1% |
| 0.5714285714 | 1 | < 0.1% |
| 0.6176470588 | 1 | < 0.1% |
| 0.625 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 246 | 1 | |
| 173.5 | 1 | |
| 137 | 1 | |
| 126 | 1 | |
| 101 | 1 | |
| 98.5 | 1 | |
| 93.5 | 1 | |
| 93 | 1 | |
| 92 | 1 | |
| 91.33333333 | 1 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
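As a minimal sketch of that calculation, the function below divides the covariance by the product of the standard deviations using only the standard library. The name `pearson_r` and the toy data are hypothetical and are not taken from the report.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of covariance and variances cancel, so sums suffice.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2 * v + 10 for v in x]        # perfect positive linear relation
y_down = [-3 * v + 7 for v in x]      # perfect negative linear relation
y_curved = [1.0, 4.0, 9.0, 16.0, 30.0]

print(pearson_r(x, y_up), pearson_r(x, y_down))

# Shifting and rescaling x (with a positive factor) leaves r unchanged.
x_rescaled = [100 * v + 5 for v in x]
print(math.isclose(pearson_r(x, y_curved), pearson_r(x_rescaled, y_curved)))
```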
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
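The rank-based calculation can be sketched as follows: convert each variable to ranks (averaging the ranks of ties), then apply the Pearson formula to the ranks. The helper names and the toy data are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy data the rank correlation reaches its maximum because the relation is perfectly monotonic, while Pearson's r stays below 1 because the relation is not linear.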
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
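The pair-counting definition can be sketched directly. This is the plain form described in the text (often called τ-a); library implementations commonly apply a tie correction, so results can differ when the data contain ties. The function name and the toy data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # sign == 0 means a tie in x or y; it counts toward neither group
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy data (hypothetical): one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

print(kendall_tau_a(x, y))        # 5 concordant, 1 discordant -> 4/6
print(kendall_tau_a(x, x[::-1]))  # fully reversed order -> -1.0
```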
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for Phik.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
First rows
| | df_index | customer_id | monetary | recency_days | qtde_invoices | qtde_items | qtde_products | avg_ticket | avg_recency_days | frequency | qtde_returns | avg_basket_size | avg_unique_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 17850 | 5391.21 | 372.00 | 34.00 | 1733.00 | 297.00 | 18.15 | -35.50 | 17.00 | 40.00 | 50.97 | 0.62 |
| 1 | 1 | 13047 | 3232.59 | 56.00 | 9.00 | 1390.00 | 171.00 | 18.90 | -27.25 | 0.03 | 35.00 | 154.44 | 11.67 |
| 2 | 2 | 12583 | 6495.30 | 2.00 | 15.00 | 3796.00 | 221.00 | 29.39 | -23.19 | 0.04 | 50.00 | 253.07 | 7.13 |
| 3 | 3 | 13748 | 938.89 | 95.00 | 5.00 | 415.00 | 27.00 | 34.77 | -92.67 | 0.02 | 0.00 | 83.00 | 4.60 |
| 4 | 4 | 15100 | 876.00 | 333.00 | 3.00 | 80.00 | 3.00 | 292.00 | -8.60 | 0.07 | 22.00 | 26.67 | 0.33 |
| 5 | 5 | 15291 | 4498.02 | 25.00 | 14.00 | 1670.00 | 96.00 | 46.85 | -23.20 | 0.04 | 29.00 | 119.29 | 4.14 |
| 6 | 6 | 14688 | 5558.18 | 7.00 | 21.00 | 3396.00 | 316.00 | 17.59 | -18.30 | 0.06 | 399.00 | 161.71 | 6.52 |
| 7 | 7 | 17809 | 5401.20 | 16.00 | 12.00 | 2006.00 | 58.00 | 93.12 | -35.70 | 0.03 | 41.00 | 167.17 | 3.58 |
| 8 | 8 | 15311 | 60402.22 | 0.00 | 91.00 | 36978.00 | 2326.00 | 25.97 | -4.14 | 0.24 | 474.00 | 406.35 | 6.00 |
| 9 | 9 | 16098 | 1991.71 | 87.00 | 7.00 | 565.00 | 66.00 | 30.18 | -47.67 | 0.02 | 0.00 | 80.71 | 4.71 |
Last rows
| | df_index | customer_id | monetary | recency_days | qtde_invoices | qtde_items | qtde_products | avg_ticket | avg_recency_days | frequency | qtde_returns | avg_basket_size | avg_unique_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2946 | 5613 | 17727 | 1060.25 | 15.00 | 1.00 | 645.00 | 66.00 | 16.06 | -6.00 | 1.00 | 6.00 | 645.00 | 66.00 |
| 2947 | 5623 | 17232 | 421.52 | 2.00 | 2.00 | 203.00 | 36.00 | 11.71 | -12.00 | 0.15 | 0.00 | 101.50 | 15.00 |
| 2948 | 5624 | 17468 | 137.00 | 10.00 | 2.00 | 116.00 | 5.00 | 27.40 | -4.00 | 0.40 | 0.00 | 58.00 | 2.50 |
| 2949 | 5635 | 13596 | 692.75 | 5.00 | 2.00 | 395.00 | 161.00 | 4.30 | -7.00 | 0.25 | 0.00 | 197.50 | 65.00 |
| 2950 | 5641 | 14893 | 1196.84 | 9.00 | 2.00 | 642.00 | 69.00 | 17.35 | -2.00 | 0.67 | 0.00 | 321.00 | 34.00 |
| 2951 | 5645 | 12479 | 468.64 | 11.00 | 1.00 | 358.00 | 29.00 | 16.16 | -4.00 | 1.00 | 34.00 | 358.00 | 29.00 |
| 2952 | 5666 | 14126 | 706.13 | 7.00 | 3.00 | 508.00 | 15.00 | 47.08 | -3.00 | 0.75 | 50.00 | 169.33 | 4.67 |
| 2953 | 5672 | 13521 | 1030.48 | 1.00 | 3.00 | 557.00 | 374.00 | 2.76 | -4.50 | 0.30 | 0.00 | 185.67 | 90.67 |
| 2954 | 5682 | 15060 | 281.67 | 8.00 | 3.00 | 187.00 | 100.00 | 2.82 | -1.00 | 1.50 | 0.00 | 62.33 | 22.67 |
| 2955 | 5701 | 12558 | 269.96 | 7.00 | 1.00 | 196.00 | 11.00 | 24.54 | -6.00 | 1.00 | 196.00 | 196.00 | 11.00 |